Skip to content

Fix read_vrt leaking integer source nodata in float-dataType VRT (#1616)#1621

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-05-11-a8be2e54
May 11, 2026
Merged

Fix read_vrt leaking integer source nodata in float-dataType VRT (#1616)#1621
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-05-11-a8be2e54

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Fixes read_vrt drops integer-source nodata when VRT declares a float dataType #1616 -- a VRT declaring dataType="Float32" or "Float64" over integer source GeoTIFFs leaked the GDAL_NODATA sentinel as a literal float into the returned array (e.g. 65535.0 instead of np.nan). attrs['nodata'] advertised the sentinel but NaN-aware downstream code saw the sentinel pixels as valid data.
  • _vrt.read_vrt now NaN-masks integer source arrays before they're placed into a float result buffer, gated on the sentinel being representable in the source dtype so out-of-range values stay no-op rather than tripping OverflowError.
  • Brings the int-source + float-dataType combo in line with the int-source + int-dataType case (already handled by the outer __init__.py promotion) and the float-source case (already handled inline by _vrt.read_vrt).

Test plan

  • New test_vrt_int_source_float_dtype_1616.py covers Float32 + uint16, Float64 + int16, the out-of-range-sentinel no-op, the no-pixel-match case, dask wrapper, attrs['nodata'] round-trip, and per-band band=N selection (7 tests, all pass).
  • All existing VRT tests pass (test_vrt_band_nodata_1598, test_vrt_int_nodata_1564, test_vrt_multiband_int_nodata_1611, test_vrt_write, test_vrt_tiled_metadata_1606, test_vrt_xml_escape_1607).
  • Full geotiff suite passes apart from two pre-existing TestPalette matplotlib deepcopy failures (unrelated to this PR).

A VRT declaring ``dataType="Float32"`` or ``"Float64"`` over integer
source GeoTIFFs used to leak the GDAL_NODATA sentinel value as a
literal float (e.g. 65535.0) into the returned array. ``attrs['nodata']``
advertised the sentinel but NaN-aware downstream code (``np.isnan``,
``xr.where``, slope/aspect, statistics) saw the sentinel pixels as
valid data and produced wrong answers.

The eager numpy, dask, and GPU paths all funnel through ``_vrt.read_vrt``,
which only NaN-masked source arrays whose dtype was already float. Mask
integer source arrays too when the result buffer is float, gated on the
sentinel being representable in the source dtype so out-of-range values
(uint16 file + ``NoDataValue=-9999``) stay no-op rather than tripping
``OverflowError``.

Single-file integer rasters already get the sentinel-to-NaN promotion
from the wrappers in ``__init__.py``; this fix brings the VRT path in
line with that semantics for the int-source + float-dataType combo.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 20:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a VRT-reading nodata masking bug where integer-source GDAL_NODATA sentinels leaked into float-typed VRT outputs as literal finite floats (e.g., 65535.0) instead of being converted to NaN, which broke downstream NaN-aware processing.

Changes:

  • Add an integer-source + float-VRT nodata masking branch in xrspatial/geotiff/_vrt.py to prevent sentinel leakage.
  • Add a dedicated regression test module covering multiple dtype/sentinel scenarios, dask wrapping, and per-band selection.
  • Update internal .claude/sweep-metadata-state.csv tracking entry for issue #1616.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_vrt.py Adds masking logic for integer-source arrays when the VRT declares a floating output dtype.
xrspatial/geotiff/tests/test_vrt_int_source_float_dtype_1616.py Adds regression coverage for the int-source + float-VRT nodata leak, including dask and band selection.
.claude/sweep-metadata-state.csv Updates metadata tracking to reflect issue #1616.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_vrt.py
Comment on lines +399 to +403
sentinel = src_arr.dtype.type(nodata_int)
mask = src_arr == sentinel
if mask.any():
src_arr = src_arr.astype(result.dtype)
src_arr[mask] = result.dtype.type('nan')
Comment thread xrspatial/geotiff/tests/test_vrt_int_source_float_dtype_1616.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@brendancol brendancol merged commit 9a5f55e into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_vrt drops integer-source nodata when VRT declares a float dataType

2 participants